Scaling Analysis of Affinity Propagation
Abstract
We analyze and exploit some scaling properties of the affinity propagation (AP) clustering algorithm proposed by Frey and Dueck [Science 315, 972 (2007)]. Following a divide-and-conquer strategy, we set up an exact renormalization-based approach to the question of clustering consistency, in particular, how many clusters are present in a given data set. We first observe that the divide-and-conquer strategy, applied hierarchically to a large data set, reduces the complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(N^{(h+2)/(h+1)})$ for a data set of size $N$ and a depth $h$ of the hierarchical strategy. For a data set embedded in a $d$-dimensional space, we show that this is obtained without notably damaging the precision, except in dimension $d=2$. In fact, for $d$ larger than 2 the relative loss in precision scales as $N^{(2-d)/((h+1)d)}$. Finally, under some conditions we observe that there is a value $s^*$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for $s<s^*$) from a coalescent phase (for $s>s^*$) of the underlying hidden cluster structure. At this precise point a self-similarity property holds, which the hierarchical strategy can exploit, through an exact decimation procedure, to locate $s^*$. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given data set.
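The complexity quoted above can be recovered by a short counting argument. This is a sketch under assumptions introduced here, not the paper's derivation: at every level the input is cut into blocks of about $N^{1/(h+1)}$ points, AP on $n$ points costs $\mathcal{O}(n^2)$, and each block returns at most $K$ exemplars, with $K$ independent of $N$. Level $\ell$ then receives $K^{\ell} N^{(h+1-\ell)/(h+1)}$ points, and the costs add up as

```latex
\begin{aligned}
C_\ell &= \underbrace{K^{\ell} N^{(h-\ell)/(h+1)}}_{\text{blocks at level } \ell}
          \times \underbrace{\mathcal{O}\big(N^{2/(h+1)}\big)}_{\text{AP on one block}}
        = \mathcal{O}\big(K^{\ell} N^{(h+2-\ell)/(h+1)}\big), \qquad 0 \le \ell < h,\\
C_{\mathrm{top}} &= \mathcal{O}\big((K^{h} N^{1/(h+1)})^{2}\big)
        = \mathcal{O}\big(K^{2h} N^{2/(h+1)}\big),\\
C_{\mathrm{total}} &= \sum_{\ell=0}^{h-1} C_\ell + C_{\mathrm{top}}
        = \mathcal{O}\big(N^{(h+2)/(h+1)}\big) \qquad (K,\ h \text{ fixed}).
\end{aligned}
```

The first level dominates because the exponent of $N$ decreases with $\ell$ and $2/(h+1) \le (h+2)/(h+1)$. Setting $h=0$ (no splitting) gives back $\mathcal{O}(N^2)$, and $h=1$ gives $\mathcal{O}(N^{3/2})$.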
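To make the divide-and-conquer idea concrete, here is a minimal plain-Python sketch for depth h = 1. It assumes negative squared Euclidean distances as similarities, the penalty entering as the negated preference s(k,k) = -penalty, and blocks of about √N randomly chosen points. The message-passing updates are the standard responsibility/availability rules of Frey and Dueck. The function names, the damping of 0.7, the iteration count and the tiny tie-breaking offset are illustrative choices. Exemplars are not weighted by the size of the cluster they stand for, so this is not the exact renormalization procedure described in the paper.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, penalty, damping=0.7, iters=400, tie_break=1e-6):
    """Message-passing AP on similarities s(i, k) = -||x_i - x_k||^2.

    Assumption: the penalty for creating an exemplar enters as the negated
    preference, s(k, k) = -penalty; a tiny index-dependent offset breaks exact
    ties between equally good exemplars. Returns (exemplars, labels), where
    labels[i] is the index of the exemplar chosen by point i.
    """
    n = len(points)
    if n == 1:
        return [0], [0]
    S = [[-penalty - tie_break * i if i == k else -sq_dist(points[i], points[k])
          for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], via the top two values
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                other = second if k == best else vals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - other)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k of max(0, r(i', k))
            for i in range(n):
                # a(k,k) = total;  a(i,k) = min(0, r(k,k) + total - max(0, r(i,k)))
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # degenerate case: keep the most self-responsible point
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
              for i in range(n)]
    return exemplars, labels


def hierarchical_ap(points, penalty, seed=0):
    """Depth h = 1: AP on blocks of about sqrt(N) points, then AP on the pooled
    block exemplars. Simplification: exemplars carry no cluster-size weight."""
    n = len(points)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random split into blocks
    size = max(1, math.isqrt(n))
    reps, rep_of = [], [0] * n  # rep_of[i]: index in reps of point i's block exemplar
    for start in range(0, n, size):
        idx = order[start:start + size]
        ex, lab = affinity_propagation([points[i] for i in idx], penalty)
        slot = {}
        for e in ex:
            slot[e] = len(reps)
            reps.append(points[idx[e]])
        for local, i in enumerate(idx):
            rep_of[i] = slot[lab[local]]
    top_ex, top_lab = affinity_propagation(reps, penalty)
    cluster_id = {e: c for c, e in enumerate(top_ex)}
    return [cluster_id[top_lab[rep_of[i]]] for i in range(n)]


if __name__ == "__main__":
    # Three well-separated Gaussian blobs of 20 points each (synthetic data).
    rng = random.Random(1)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(20)]
    labels = hierarchical_ap(data, penalty=20.0)
    print("clusters found:", len(set(labels)))
```

The quadratic message-passing cost is paid only inside blocks of about √N points and once on the pooled exemplars. Varying `penalty` and recording the number of clusters found is one simple way to probe the fragmentation/coalescence behaviour discussed above on synthetic data.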
Similar Resources
A New Knowledge-Based System for Diagnosis of Breast Cancer by a combination of the Affinity Propagation and Firefly Algorithms
Breast cancer has become a widespread disease around the world in young women. Expert systems, developed by data mining techniques, are valuable tools in diagnosis of breast cancer and can help physicians for decision making process. This paper presents a new hybrid data mining approach to classify two groups of breast cancer patients (malignant and benign). The proposed approach, AP-AMBFA, con...
Partition Affinity Propagation for Clustering Large Scale of Data in Digital Library
Data clustering is very useful in helping users visit the large scale of data in digit library. In this paper, we present an improved algorithm for clustering large scale of data set with dense relationship based on Affinity Propagation. First, the input data are divided into several groups and Affinity Propagation is applied to them respectively. Results from first step are grouped together in...
Beyond Affinity Propagation: Message Passing Algorithms for Clustering
Inmar-Ella Givoni, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2012. Affinity propagation is an exemplar-based clustering method that takes as input similarities between data points. It outputs a set of data points that best represent the data (exemplars), and assignments of each non-exem...
Homoplasmic Stability and Cytoplasmic Inheritence of DARPin G3 Scaffold Protein in Generative and Vegetative Propagation of Transplastoic Tobacco Plants
Plastid engineering gives numerous benefits for the next generation of transgenic technology, consisting of the convenient use of transgene stacking and the production of high expression levels of recombinant proteins. Designed ankyrin repeat proteins (DARPin) are relatively small non-immunoglobulin scaffold proteins that bind to their specific target with high affinity. The G3 is a type of DAR...
Biclustering of Expression Microarray Data Using Affinity Propagation
Biclustering, namely simultaneous clustering of genes and samples, represents a challenging and important research line in the expression microarray data analysis. In this paper, we investigate the use of Affinity Propagation, a popular clustering method, to perform biclustering. Specifically, we cast Affinity Propagation into the Couple Two Way Clustering scheme, which allows to use a clusteri...
Journal title:
- Physical review. E, Statistical, nonlinear, and soft matter physics
Volume 81, Issue 6 Pt 2
Publication date: 2010